AD-A256  284 



Simultaneous  DIF  Amplification  and  Cancellation: 
Shealy-Stout’s Test for DIF


Ratna  Nandakumar1 
Department  of  Educational  Studies 
University  of  Delaware 




August  15,  1992 




Prepared  for  the  Cognitive  Science  Research  Program,  Cognitive  and  Neural  Sciences 
Division, Office of Naval Research, under grant number N00014-90-J-1940, 4421-548.
Approved for public release, distribution unlimited. Reproduction in whole or in part is
permitted  for  any  purpose  of  the  United  States  Government. 


1  The  author  would  like  to  convey  special  thanks  to  William  Stout  for  his  insightful 
suggestions  on  this  research  and  to  Louis  Roussos  for  programming  assistance. 


REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB No. 0704-0188




1. AGENCY USE ONLY (Leave blank)


2. REPORT DATE

15 August 1992


3. REPORT TYPE AND DATES COVERED

Technical


4. TITLE AND SUBTITLE

Simultaneous DIF Amplification and Cancellation:
Shealy-Stout's Test for DIF


5. FUNDING NUMBERS

N00014-90-J-1940
6.  AUTHOR(S) 

Ratna  Nandakumar 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Department  of  Statistics 

University  of  Illinois 
725  South  Wright  Street 
Champaign,  IL  61820 


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


1992  -  No.  4 


9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

Cognitive  Sciences  Program 
Office  of  Naval  Research 
800  N.  Quincy 
Arlington, VA 22217-5000


10.  SPONSORING  /  MONITORING 
AGENCY  REPORT  NUMBER 


4421-548 


11.  SUPPLEMENTARY  NOTES 

To  be  published  in  Journal  of  Educational  Measurement 


12a.  DISTRIBUTION  AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 


12b.  DISTRIBUTION  CODE 


13.  ABSTRACT  (Maximum  200  words) 

See  reverse 


14.  SUBJECT  TERMS 

See  reverse 


15.  NUMBER  OF  PAGES 

33 


16.  PRICE  CODE 


17. SECURITY CLASSIFICATION
OF REPORT

unclassified


18. SECURITY CLASSIFICATION
OF THIS PAGE

unclassified


19.  SECURITY  CLASSIFICATION 
OF  ABSTRACT 

unclassified 


20.  LIMITATION  OF  ABSTRACT 



Simultaneous DIF Amplification and Cancellation: Shealy-Stout’s Test for DIF

Abstract 

The  present  study  investigates  the  phenomena  of  simultaneous  DIF  amplification 
and  cancellation  and  SIBTEST’s  role  in  detecting  such.  A  variety  of  simulated  test  data 
were  generated  for  this  purpose.  In  addition,  real  test  data  from  various  sources  were  used. 
The  results  from  both  simulated  as  well  as  real  test  data,  as  Shealy  and  Stout’s  theory 
suggests,  show  that  the  SIBTEST  is  effective  in  assessing  the  DIF  amplification  and 
cancellation  (partially  or  fully)  at  the  test  score  level.  Finally,  methodological  and 
substantive  implications  of  DIF  amplification  and  cancellation  are  discussed. 

Subject  terms:  SIBTEST,  DIF,  item  bias,  test  bias,  bias  amplification,  bias  cancellation. 




Amplification  and  Cancellation  of  DIF  2 


Simultaneous DIF Amplification and Cancellation: Shealy-Stout’s Test for DIF

Studies  of  bias  have  been  widely  prevalent  in  educational  measurement  since  the 
1960s.  Early  attempts  to  study  bias  in  tests  were  largely  based  on  the  notion  of  predictive 
validity.  Consequently,  a  number  of  regression  models  were  developed,  based  on  different 
definitions  of  fairness,  in  order  to  achieve  fair  employment  selection  and  college  admissions 
(Peterson  and  Novick,  1976).  Since  the  advent  of  item  response  theory  (IRT),  however, 
study  of  bias  and  differential  item  functioning  (DIF)  at  the  item  level  has  gained  much 
popularity.  Several  methodologies  have  been  developed  by  various  researchers  to  study 
item  bias  and  DIF  (for  descriptions  and/or  comparisons  of  different  procedures,  see  for 
example,  Angoff,  1982;  Cleary  &  Hilton,  1968;  Dorans  &  Kulick,  1983,  1986;  Hambleton  & 
Rogers, 1989; Holland & Thayer, 1988; Hunter, 1975; Ironson, 1982; Lord, 1980; Raju,
1988; Reynolds, 1982; Scheuneman, 1979; Shealy & Stout, 1992b; Shepard, Camilli, &
Averill,  1981;  Swaminathan  &  Rogers,  1990;  Wainer,  Sireci,  &  Thissen,  1991). 

These  procedures  can  usually  be  used  in  an  effort  to  detect  either  item  bias  or  DIF. 
The  subtle  distinction  between  the  closely  related  concepts  of  bias  and  DIF  can  be 
explained  as  follows.  In  the  conceptualization  of  "item  bias",  it  is  generally  assumed  that 
the  validity  of  some  items  of  the  test  could  be  questionable  while  the  rest  of  the  items  are 
considered  valid.  That  is,  these  items  of  questionable  validity  could  contribute  to  test  score 
differences  between  groups  of  examinees  with  equal  ability.  In  DIF  analyses,  however,  it  is 
conceptualized  that  some  items  could  contribute  to  test  score  differences  between  two 
groups  of  examinees  matched  according  to  some  criterion  about  which  no  validity  claim  is 
made.  For  example,  examinees  could  be  matched  upon  total  test  score  with  no 
accompanying  claim  of  validity  for  the  items  of  the  test.  Therefore,  in  item  bias  analyses, 
the  construct  validity  of  the  matching  subtest  needs  to  be  established  while  in  DIF  analyses 
it is not needed. In this sense item bias is a special case of DIF. Several biased items acting
in concert produce test bias, and several DIF items acting in concert produce DTF
(differential test functioning). Shealy and Stout (1992b) discuss the
differences between bias and DIF analyses in more detail.

One of the recently developed IRT-based methodologies for detecting item/test bias
or DIF/DTF is due to Shealy and Stout (1992a, 1992b). Known as SIBTEST
(SIB denotes simultaneous item bias), it is a statistical test to simultaneously detect bias
present in one or more items of a test. SIBTEST is an outgrowth of the multidimensional
IRT modeling of test bias as presented in Shealy and Stout (1992a), and it is the first
among IRT-based procedures to allow the simultaneous testing for bias present in more
than one item. The phenomenon of simultaneous item bias is said to occur when several
biased items acting in concert affect the test score differentially for the different examinee
subpopulations, resulting in test bias. In part because of its multidimensional modeling
approach, SIBTEST has several distinct features. First, single item bias as well as
simultaneous item bias can be detected. Second, a formal distinction can be made between
genuine test bias and impact, which is due to ability differences between groups in the
ability intended to be measured (Ackerman, 1991a; Dorans, 1989). Third, the underlying
psychological (cognitive) mechanisms that produce bias can be explicitly addressed through
consideration of the target ability as contrasted with nuisance determinants. The target
ability $\theta$ is the ability intended to be measured by the test; the nuisance determinant(s) $\eta$
is an ability or construct not intended to be measured by the test but influencing the
responses to one or more items.

One  of  the  major  advantages  of  considering  simultaneous  item  bias  is  that  it  is 
possible  to  study  item  bias  amplification  and  item  bias  cancellation.  Bias  amplification  is 
illustrated  by  the  following:  if  a  set  of  individual  items  is  each  biased  against  males,  then 
one  can  study  the  effect  of  the  bias  collectively  against  males  at  the  overall  test  score  level. 
Bias cancellation is illustrated by the following: if one set of individual items is each biased
against males and another set of items is each biased against females, then it is possible
that  at  the  overall  test  score  level  the  respective  biases  might  cancel  each  other  out.  In  any 
bias  study  one  should  investigate  both  of  these  possibilities.  The  phenomenon  of  item  bias 
cancellation  has  been  previously  studied  empirically  by  Drasgow  (1987),  Roznowski  (1987), 
and  Reith  and  Roznowski  (1991). 

Reith  and  Roznowski  (1991)  and  Roznowski  (1987)  have  studied  the  effect  of  biased 
items  on  the  predictive  validity  of  the  test.  They  concluded  that  inclusion  of  biased  items 
in  the  test  can  actually  contribute  to  increased  predictive  validity  when  the  sources  of  bias 
are diverse and multiply determined. They argue that, although items with non-trait (but
trait-relevant) variance may manifest bias at the item level, nonetheless, several such
items  can  actually  improve  the  amount  of  variance  explained  by  the  trait  at  the  test  score 
level  (here  "trait"  refers  to  the  ability  of  interest).  This  is  because,  at  the  test  score  level, 
the  amount  of  non-trait  variance  diminishes  while  the  trait  variance  increases,  thus 
improving  the  predictive  validity.  Thus,  the  removal  of  biased  items  might  sometimes  be 
considered  to  be  detrimental  to  the  predictive  validity  of  the  test. 

Drasgow  (1987)  has  shown,  using  Lord’s  chi-square  item  bias  statistic,  that  several 
biased  items  of  ACT  mathematics  usage  and  English  usage  tests,  biased  in  different 
directions  (some  against  Whites,  some  against  Blacks,  some  against  Hispanics,  etc.),  had 
no  cumulative  bias  effect  on  the  expected  number-correct  score.  That  is,  there  were  no 
consistent  differences  in  the  test  scores  across  groups.  This  was  attributed  to  bias 
cancellation  across  groups.  Humphreys  (1970,  1986)  has  long  recommended  deliberate 
inclusion of diverse non-trait determinants in test items in order to diminish the biasing
influence  of  any  particular  non-trait  ability  at  the  test  score  level.  These  studies  clearly 
show  that  the  study  of  the  effect  of  amplification  or  cancellation  of  biased  or  DIF  items  at 
the  test  score  level  is  a  significant  problem.  Shealy  and  Stout  (1992a)  directly  address  these 
issues by modeling bias in a multidimensional framework and considering the simultaneous
influence of several biased items at once. According to them, the presence of
multidimensionality  is  a  prerequisite  for  bias.  If  test  data  can  be  modeled  by  a 
unidimensional  or  an  essentially  unidimensional  (Stout,  1990)  model,  then  bias  cannot 
exist. The concept of bias in a multidimensional framework has also been emphasized by
Shepard  (1982),  Kok  (1988)  and  others.  As  noted  before,  the  SIBTEST  procedure  is  an 
outgrowth  of  the  multidimensional  modeling  of  bias. 

Shealy and Stout (1992a, 1992b) have demonstrated through simulation studies the
ability of SIBTEST to detect unidirectional bias; that is, bias against the same group
regardless of the level of target ability $\theta$. In their simulations, they used two- and
three-parameter logistic models with varying sample sizes and differing degrees of induced
bias. The findings showed that SIBTEST displayed good adherence to the nominal level of
significance in cases of no bias and good power in cases where one or more items were
biased, even when the amount of bias was fairly small. In cases of single item bias studies,
the performance of SIBTEST was compared to that of the Mantel-Haenszel statistic. Both
the SIBTEST and the Mantel-Haenszel procedures produced consistent results with
respect to the direction and the amount of estimated bias.

The  purpose  of  this  paper  is  to  define  the  concepts  of  DIF  amplification  and  DIF 
cancellation  and  to  investigate  the  power  of  SIBTEST  to  address  these  phenomena.  A 
series  of  real  data  and  simulation  data  are  used  for  this  purpose.  In  case  of  single  item 
analyses,  SIBTEST  results  are  compared  with  the  Mantel-Haenszel  results.  Also,  a  brief 
description  of  the  SIBTEST  procedure  is  provided. 

Description  of  SIBTEST  Procedure 

In  this  section,  for  ease  of  presentation,  we  will  assume  the  bias  viewpoint  rather 
than the DIF/DTF viewpoint. It is vital, however, to realize that a similar presentation
could have been given using the DIF/DTF perspective. As discussed before,
SIBTEST results have either a test bias or a DTF interpretation,
depending upon the level of user assumptions about the validity of the matching subtest
items. In particular, SIBTEST can be used as a DIF procedure if desired.

Two groups (or subpopulations) of interest, the reference group ($R$) and the focal
group ($F$), are assumed to take a given test. The complete latent space underlying the
test items is assumed to be multidimensional: $\boldsymbol{\theta} = (\theta, \boldsymbol{\eta})$, where $\theta$ is the target ability,
intended to be measured by the test, and $\boldsymbol{\eta}$ is the nuisance ability vector (possibly
multidimensional), not intended to be measured by test items. For example, in an English
vocabulary test, it is possible that some items are male oriented, such as those requiring
knowledge of sports, and some other items are female oriented, such as those requiring
knowledge of domestics. In a situation like this, English vocabulary skill is the ability
intended to be measured ($\theta$). Knowledge of sports ($\eta_1$) and knowledge of domestics ($\eta_2$) are
nuisance abilities. Let $U$ denote the test response vector and $h(U)$ the test scoring method.
Number correct is used as the scoring method throughout this paper. It is assumed that all
items of the given test measure the target ability $\theta$, and some items (biased items) measure
both target ability and one or more nuisance abilities $\boldsymbol{\eta}$. It is also assumed that the usual
IRT assumptions of local independence, monotonicity, and group invariance hold with
respect to $\boldsymbol{\theta}$ and that this collection of assumptions does not hold for any subset of
components of $\boldsymbol{\theta}$.

The statistical procedure for testing the null hypothesis of no test bias is briefly
explained below; for details, see Shealy and Stout (1992b). The hypothesis can be stated as:

$$H_0\colon \beta_U = 0 \quad \text{vs.} \quad H_1\colon \beta_U > 0,$$

where $\beta_U$ is a parameter denoting the amount of unidirectional test bias against the focal
group. Unidirectional bias occurs if the probability of answering an item (or items) correctly is consistently
higher (lower) for one group compared to the other, over all levels of ability $\theta$. That is,
marginal item characteristic curves for the two groups do not cross as $\theta$ varies over the
ability range. Let $X = \sum_{i=1}^{n} U_i$ be the total score on the valid subtest, which by definition
consists of $n$ items the user is willing to assume measure the target ability. Let
$Y = \sum_{i=n+1}^{N} U_i$ be the total score on the studied subtest, which consists of one or more items
measuring target and possibly nuisance abilities. It is assumed that, for long tests,
examinees with the same valid subtest score are of approximately equal target ability $\theta$ and
thus  are  comparable.  Following  this  logic,  examinees  within  reference  and  focal  groups  are 
subgrouped  according  to  their  total  score  on  the  valid  subtest.  Examinees  with  the  same 
valid  subtest  score  are  then  compared  across  reference  and  focal  groups  on  their 
performance  on  the  studied  subtest  item(s).  The  test  statistic,  which  is  a  sort  of 
standardization  index  (see  Dorans  &  Kulick,  1986),  for  testing  the  null  hypothesis  of  no 
bias  is  then  given  by 


$$B = \frac{\hat{\beta}_U}{\hat{\sigma}(\hat{\beta}_U)} \tag{1}$$

where $\hat{\beta}_U = \sum_{k=0}^{n} \hat{p}_k (\bar{Y}^*_{Rk} - \bar{Y}^*_{Fk})$ and $\hat{p}_k$ is the proportion among focal group examinees
attaining $X = k$ on the valid subtest. $\bar{Y}^*_{Rk}$ and $\bar{Y}^*_{Fk}$ are the "adjusted" means of the studied
subtest for examinees with a valid subtest score of $X = k$ ($k = 0, 1, \ldots, n$) in the reference and
focal groups respectively. Because the procedure must work for short as well as long tests,
these means are adjusted for differences in the $\theta$ distributions between reference and focal
groups arising from short test lengths (for example, 25 items), and inherent differences in
the $\theta$ distributions for the two groups (for details, see regression correction in Shealy &
Stout, 1992b). $\hat{\sigma}(\hat{\beta}_U)$ is the estimated standard error of $\hat{\beta}_U$ given by

$$\hat{\sigma}(\hat{\beta}_U) = \left[ \sum_{k=0}^{n} \hat{p}_k^2 \left( \frac{1}{J_{Rk}}\, \hat{\sigma}^2(Y \mid k, R) + \frac{1}{J_{Fk}}\, \hat{\sigma}^2(Y \mid k, F) \right) \right]^{1/2}$$

where $\hat{\sigma}^2(Y \mid k, g)$ is the sample variance of the studied subtest for examinees in group $g$ ($R$
or $F$) with a total score of $k$ on the valid subtest; and $J_{Rk}$ and $J_{Fk}$ are the sample sizes in
the reference and focal groups respectively with a total score of $k$ on the valid subtest.

The null hypothesis of no bias is rejected with error rate $\alpha$ if the value of $B$ exceeds
the upper $100(1-\alpha)$th percentile point of the standard normal distribution. $\hat{\beta}_U$ is also the
statistic used to estimate the amount of unidirectional bias $\beta_U$. For example, a $\hat{\beta}_U$ value of
0.1 indicates that the average difference in the expected total test scores between reference
and focal group examinees of similar ability is 0.1. If this is the result of a single studied
item with the remainder of the items assumed valid, then $\hat{\beta}_U = 0.1$ is the estimated
difference in the probability of getting the studied item correct between reference and focal
group examinees of similar ability. Positive values of $\hat{\beta}_U$ indicate bias against the focal
group and negative values of $\hat{\beta}_U$ indicate bias against the reference group. Simulation
studies by Shealy and Stout (1992b) showed that $B$ has good statistical properties such as
good adherence to the nominal significance level and high power.
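
As a rough illustration of how the estimate, its standard error, and $B$ fit together, the following Python sketch computes them from matched score pairs. It is a simplified reading of Equation 1 and not the SIBTEST program itself: it uses raw cell means in place of the regression-corrected "adjusted" means, and it skips score levels with fewer than two examinees in either group (an assumption made here so that sample variances exist). The function name `sibtest_uncorrected` and the input layout are hypothetical.

```python
import math
from collections import defaultdict


def sibtest_uncorrected(ref, foc):
    """Return (beta_hat, se, B) from (x, y) pairs for each group.

    x is the valid-subtest score and y the studied-subtest score.
    Raw cell means are used, i.e. no regression correction is applied.
    """
    def cells(pairs):
        out = defaultdict(list)
        for x, y in pairs:
            out[x].append(y)
        return out

    def mean(v):
        return sum(v) / len(v)

    def sample_var(v):
        m = mean(v)
        return sum((y - m) ** 2 for y in v) / (len(v) - 1)

    r, f = cells(ref), cells(foc)
    # Assumption: keep only score levels k with >= 2 examinees per group.
    levels = [k for k in r if k in f and len(r[k]) > 1 and len(f[k]) > 1]
    n_focal = sum(len(f[k]) for k in levels)

    beta_hat, variance = 0.0, 0.0
    for k in levels:
        # Focal-group proportion at X = k (among the retained levels).
        p_k = len(f[k]) / n_focal
        beta_hat += p_k * (mean(r[k]) - mean(f[k]))
        variance += p_k ** 2 * (sample_var(r[k]) / len(r[k])
                                + sample_var(f[k]) / len(f[k]))
    se = math.sqrt(variance)
    b_stat = beta_hat / se if se > 0 else float("nan")
    return beta_hat, se, b_stat
```

The value of `b_stat` would then be compared with the upper percentile point of the standard normal distribution, as described above. For short valid subtests the uncorrected means used here can differ noticeably from the adjusted means, so this sketch is best read as a description of the structure of the statistic.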


Simulation  Study 
Details  about  Simulations 


In  order  to  investigate  amplification  and  cancellation  of  DIF  and  the  use  of 
SIBTEST  to  detect  such,  a  simulation  study  was  designed  to  model  realistic  situations. 
Item parameters $(a_i, b_i, c_i)$ of valid subtests were obtained from the literature and the item
parameters of studied subtests were hand selected to control the amount of DIF present.
The estimated item parameters from the SAT-Verbal (Drasgow, 1987) were used for valid
subtests. The parameters of the studied subtest items (that is, DIF items) are listed in
Table 1. Item parameters of studied subtests were selected such that the difficulty
parameters were all centered around zero, with varying discrimination parameters for $\theta$, $\eta_1$,
and $\eta_2$. All studied subtest items, except the last three, are influenced by $\theta$ and $\eta_1$. The
last three items are influenced by $\theta$ and $\eta_2$. The guessing level is fixed to 0.2 for all items.
For amplification studies, only items with nuisance ability $\eta_1$ were used. For the
amplification and cancellation study, both items with nuisance ability $\eta_1$ and items with
nuisance ability $\eta_2$ were used.

Amplification  Study 

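The target and the nuisance abilities were generated from a bivariate normal
distribution as follows. For notational simplicity the subscript for $\eta_1$ is dropped.

$$\begin{pmatrix} \theta \\ \eta \end{pmatrix} \Bigg|\, g \sim N\!\left( \begin{pmatrix} \mu_{\theta g} \\ \mu_{\eta g} \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right), \quad g = R, F, \tag{2}$$

where $\rho$ is the correlation between $\theta$ and $\eta$ for group $g$, which is set at 0.5 for both groups
(different values of $\rho$ across groups tend to produce bidirectional DIF). As can be seen, the
variances $\sigma^2(\theta \mid g)$ and $\sigma^2(\eta \mid g)$ were set at 1. The means $\mu_{\theta g}$ and $\mu_{\eta g}$ for each group were
determined through specification of other parameters as follows.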

Target ability difference between the reference and focal groups is denoted by

$$d_T = \frac{\mu_{\theta R} - \mu_{\theta F}}{\sigma_{\theta P}}, \tag{3}$$

where

$$\sigma^2_{\theta P} = a_R\, \sigma^2(\theta \mid R) + a_F\, \sigma^2(\theta \mid F), \quad a_R = \frac{J_R}{J_R + J_F} \quad \text{and} \quad a_F = \frac{J_F}{J_R + J_F},$$

and $J_R$ and $J_F$ denote sample sizes in reference and focal groups respectively. $\sigma^2_{\theta P}$ is the
weighted average of the variances of reference and focal groups on the target ability. Since
$\sigma^2(\theta \mid R)$ and $\sigma^2(\theta \mid F)$ were taken as 1 in simulation studies (see Equation 2), $d_T =
\mu_{\theta R} - \mu_{\theta F}$. That is, $d_T$ is a measure of how much the two groups differ in target ability
distributions (same as impact).

Another criterion for choosing $\mu_{\theta R}$ and $\mu_{\theta F}$ was that the average difficulty level ($\bar{b}$)
of the valid subtest items was assumed equal to the average target ability pooled across
groups:

$$\bar{b} = E[\theta] = a_R\, \mu_{\theta R} + a_F\, \mu_{\theta F}. \tag{4}$$

That is, on average the difficulty of the valid subtest items is assumed to be well matched
with the pooled average target ability of the two groups. By specifying $d_T$ and $\bar{b}$, Equations
3 and 4 together determine $\mu_{\theta R}$ and $\mu_{\theta F}$. Parameters $\mu_{\eta R}$ and $\mu_{\eta F}$ were determined as
follows.

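Potential for DIF $c_\beta$ is defined as the difference between the conditional
expectation of $\eta$ for the two groups, given by

$$c_\beta = E[\eta_R \mid \theta] - E[\eta_F \mid \theta] = \mu_{\eta R} + \rho(\theta - \mu_{\theta R}) - \mu_{\eta F} - \rho(\theta - \mu_{\theta F}).$$

Following Equations 2 and 3,

$$c_\beta = (\mu_{\eta R} - \mu_{\eta F}) - \rho\, d_T. \tag{5}$$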




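Another criterion for choosing the means of $\eta$ is that, for an "average" value of
target ability ($\theta = 0$) we assume the conditional nuisance ability to be centered around the
chosen target ability value for the two groups. Namely,

$$E[\eta_R \mid \theta = 0] = -E[\eta_F \mid \theta = 0].$$

That is,

$$(\mu_{\eta R} - \rho\, \mu_{\theta R}) = -(\mu_{\eta F} - \rho\, \mu_{\theta F}). \tag{6}$$

Once $\mu_{\theta R}$ and $\mu_{\theta F}$ are known, by specifying $c_\beta$, $\mu_{\eta R}$ and $\mu_{\eta F}$ can be determined
from Equations 5 and 6.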
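
The four group means follow from Equations 3 through 6 by solving two small linear systems. The Python sketch below spells this out under the unit-variance assumption used in the simulations, so that Equation 3 reduces to a difference of means. The function name `ability_means` and its argument names are hypothetical and are not taken from the original simulation program.

```python
def ability_means(d_t, b_bar, c_beta, rho, j_r, j_f):
    """Solve Equations 3-6 for the group means, assuming unit variances.

    Returns (mu_theta_R, mu_theta_F, mu_eta_R, mu_eta_F).
    """
    a_r = j_r / (j_r + j_f)
    a_f = j_f / (j_r + j_f)
    # Equations 3 and 4: mu_R - mu_F = d_T and a_R*mu_R + a_F*mu_F = b_bar.
    mu_theta_r = b_bar + a_f * d_t
    mu_theta_f = b_bar - a_r * d_t
    # Equation 5 fixes the difference of the nuisance means;
    # Equation 6 fixes their sum.
    diff = c_beta + rho * d_t
    total = rho * (mu_theta_r + mu_theta_f)
    mu_eta_r = (total + diff) / 2
    mu_eta_f = (total - diff) / 2
    return mu_theta_r, mu_theta_f, mu_eta_r, mu_eta_f
```

With $d_T = 0$ and $\bar{b} = 0$, the target means are both zero and the nuisance means are placed symmetrically at plus and minus half of the chosen potential for DIF.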

The choice of values for $c_\beta$ in the simulations was guided by the desired amount of
the estimated DIF, $\hat{\beta}_U$. In other words, values of $c_\beta$ were chosen so that the amount of
estimated DIF would be "small" ($0 < \hat{\beta}_U < 0.05$), "moderate" ($0.05 < \hat{\beta}_U < 0.1$), or "large"
($\hat{\beta}_U > 0.1$). From the practical viewpoint, the standard used to determine what is meant by
small, moderate, or large DIF was based on observed delta values of the Mantel-Haenszel
statistic $\Delta_{MH}$ (Holland & Thayer, 1988). An approximate empirical relationship between
$\Delta_{MH}$ and $\hat{\beta}_U$ is given by

$$\hat{\beta}_U \approx -\Delta_{MH}/10. \tag{7}$$

Recall that $\beta_U$ is a measure of the average difference in expected test scores between
reference and focal group members of similar ability. That is, $\beta_U$ as estimated by $\hat{\beta}_U$ can
be useful for direct interpretations of DIF in terms of differing expectations of total score
for the two groups.

In simulation studies presented here $d_T$ was taken as zero. That is, the difference
between the target ability means in the two groups was zero. For simulation studies where
$d_T \neq 0$, see Shealy & Stout (1992b). Two values of $c_\beta$ were considered: 0.5 and 1.0. Positive
values of $c_\beta$ denote DIF against the focal group and negative values of $c_\beta$ denote DIF
against the reference group. Three different combinations of examinee sizes ($J_F$, $J_R$),
typical of those commonly occurring in applications, were considered: ($J_F = 500$, $J_R = 500$),
(1000, 3000), and (1000, 1000). Two valid subtest lengths ($N$) were considered: 25 and 50
items. These items were randomly selected from 80 estimated three-parameter logistic
item parameters. Item responses for the valid subtest were generated by using the
three-parameter logistic model:


$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp(-1.7\, a_i(\theta - b_i))} \tag{8}$$

where $a_i$, $b_i$, and $c_i$ are the discrimination, difficulty and guessing parameters of item $i$.
Item responses for the studied subtest were generated by using the two-dimensional
three-parameter logistic model with compensatory abilities (Reckase & McKinley, 1983):

$$P_i(\theta, \eta) = c_i + \frac{1 - c_i}{1 + \exp\bigl(-1.7\,(a_{i\theta}(\theta - b_{i\theta}) + a_{i\eta}(\eta - b_{i\eta}))\bigr)}, \quad i = n+1, \ldots, N \tag{9}$$


For each simulated examinee (see Equation 2), binary item responses (0,1) were
obtained as follows. The probability of correctly answering valid subtest items was
computed using Equation 8. If a simulated uniform random value on the interval (0,1) was
less than or equal to the computed $P_i(\theta)$, then the item was considered answered correctly
and a score of 1 was assigned. Otherwise the item was considered incorrect and a score of 0
was assigned. Similarly, for studied items $P_i(\theta, \eta)$ was computed using Equation 9 and a
score value of 0 or 1 was assigned.
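
The response-generation step can be sketched in a few lines of Python. This is a minimal illustration of Equations 8 and 9 and of the uniform-draw scoring rule, not the original simulation code; the function names, the fixed seed, and the example item ($a = 1$, $b = 0$, $c = 0.2$) are assumptions chosen here for illustration.

```python
import math
import random


def p_valid(theta, a, b, c):
    """Equation 8: unidimensional three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def p_studied(theta, eta, a_theta, b_theta, a_eta, b_eta, c):
    """Equation 9: two-dimensional compensatory three-parameter model."""
    z = a_theta * (theta - b_theta) + a_eta * (eta - b_eta)
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * z))


def score(p, rng):
    """Return 1 if a uniform draw is less than or equal to p, else 0."""
    return 1 if rng.random() <= p else 0


rng = random.Random(0)  # fixed seed, used here only for reproducibility
# Hypothetical item with a = 1, b = 0 and the study's guessing level c = 0.2,
# answered repeatedly by an examinee with theta = 0.
responses = [score(p_valid(0.0, 1.0, 0.0, 0.2), rng) for _ in range(20000)]
proportion_correct = sum(responses) / len(responses)
```

Because the model is compensatory, a high nuisance ability can offset a low target ability in `p_studied`; setting `a_eta` to zero reduces it to the unidimensional model of Equation 8.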




Cancellation Study


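Since there are two nuisance abilities $\eta_1$ and $\eta_2$ in this case, these are generated as
follows. The $\theta$ and $\eta_1$ have a bivariate normal distribution given by

$$\begin{pmatrix} \theta \\ \eta_1 \end{pmatrix} \Bigg|\, g \sim N\!\left( \begin{pmatrix} \mu_{\theta g} \\ \mu_{\eta_1 g} \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right) \tag{10}$$

and $\theta$ and $\eta_2$ have a bivariate normal distribution given by

$$\begin{pmatrix} \theta \\ \eta_2 \end{pmatrix} \Bigg|\, g \sim N\!\left( \begin{pmatrix} \mu_{\theta g} \\ \mu_{\eta_2 g} \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right) \tag{11}$$

where $\rho$ is the correlation between $\theta$ and $\eta_1$, and between $\theta$ and $\eta_2$, which is taken to be 0.5
for both groups. Also, $\eta_1$ and $\eta_2$ were generated independently of each other, for each fixed
$\theta$. As in the case of amplification, variances $\sigma^2(\theta \mid g)$, $\sigma^2(\eta_1 \mid g)$ and $\sigma^2(\eta_2 \mid g)$ were all taken
to be 1. The means $\mu_{\theta R}$ and $\mu_{\theta F}$ were determined by Equations 3 and 4. The means
$\mu_{\eta_1 R}$ and $\mu_{\eta_1 F}$ ($\mu_{\eta_2 R}$ and $\mu_{\eta_2 F}$) were determined through Equations 12
and 13 as follows:

$$c_{\beta_i} = E[\eta_{iR} \mid \theta] - E[\eta_{iF} \mid \theta] = (\mu_{\eta_i R} - \mu_{\eta_i F}) - \rho\, d_T, \quad i = 1, 2 \tag{12}$$

$$(\mu_{\eta_i R} - \rho\, \mu_{\theta R}) = -(\mu_{\eta_i F} - \rho\, \mu_{\theta F}), \quad i = 1, 2 \tag{13}$$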




where $c_{\beta_i}$ is the potential for DIF caused by the nuisance ability $\eta_i$ and is chosen just as
for the amplification case. Item responses were generated just as in the amplification case
using Equations 8 and 9. Here Equation 9 applies to $(\theta, \eta_1)$ or $(\theta, \eta_2)$ depending upon item
number. For example, items 1 through 11 of Table 1 depend upon $\theta$ and $\eta_1$, and items 12
through 14 depend upon $\theta$ and $\eta_2$.
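
One way to draw abilities with this structure is to sample $\theta$ first and then sample each nuisance ability from its conditional normal distribution given $\theta$. The Python sketch below follows that route, assuming unit variances and a common correlation $\rho$; it is an illustration and not the original generator, and the function name `draw_abilities` is hypothetical. A consequence of conditional independence under these assumptions is that the marginal correlation between the two nuisance abilities equals $\rho^2$.

```python
import math
import random


def draw_abilities(mu_theta, mu_eta1, mu_eta2, rho, rng):
    """Draw (theta, eta1, eta2) for one examinee of a given group.

    Each (theta, eta_i) pair is bivariate normal with unit variances and
    correlation rho; eta1 and eta2 are independent given theta.
    """
    theta = rng.gauss(mu_theta, 1.0)
    cond_sd = math.sqrt(1.0 - rho ** 2)  # conditional SD of eta_i given theta
    shift = rho * (theta - mu_theta)     # conditional mean shift
    eta1 = mu_eta1 + shift + cond_sd * rng.gauss(0.0, 1.0)
    eta2 = mu_eta2 + shift + cond_sd * rng.gauss(0.0, 1.0)
    return theta, eta1, eta2


rng = random.Random(12345)  # fixed seed, used here only for reproducibility
# Illustrative means only; in the study they follow from Equations 3, 4, 12, 13.
sample = [draw_abilities(0.0, 0.25, -0.25, 0.5, rng) for _ in range(20000)]
```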

Results  of  Simulation  Study 


Three different simulation studies were done, each with varying values for
$c_\beta$ and $N$. The results for Amplification Study 1 are shown in Table 2. This study has 500
examinees in each of the focal and reference groups with 50 items in the valid subtest. The
first column denotes the item numbers (taken from Table 1) used in the studied subtest;
the second column denotes the degree of potential for DIF induced in the simulations ($c_\beta$);
the third column denotes the average estimated DIF over 100 replications ($\hat{\beta}_U$); the fourth
column denotes the observed (estimated) standard error of $\hat{\beta}_U$ over 100 replications; and
the fifth column denotes the rejection rate of testing the null hypothesis of no DIF over 100
replications. The last three columns report the estimated mean, standard error, and the
rejection rate of DIF using the Mantel-Haenszel statistic over 100 replications. The first
row of Table 2, for example, denotes that item 4, from Table 1, was used in the studied
subtest with .50 as the potential for DIF. The average amount of estimated DIF, over 100
replications, was .022 with a standard error of .036. The null hypothesis of no DIF was
rejected in 18 out of 100 replications. The Mantel-Haenszel analyses indicate that for this
item, the estimated mean of $\hat{\Delta}_{MH}$ was $-.342$ with an observed standard error of .435. The
null hypothesis of no DIF was rejected 9 times out of 100 replications.


As can be seen from Table 2, each of the items 4, 5, 6, 7, and 8 was tested
individually for DIF, and then all five were tested collectively. That is, in each case the valid
subtest consisted of 50 items and the studied subtest consisted of exactly one item, except for
the last row, where the studied subtest consisted of all five items. The
average amount of estimated DIF for individual items ranged from .022 to .035, indicating
small DIF ($0 < \hat{\beta}_U < .05$) at the item level. When all five DIF items were included in the
studied subtest, however, the amount of estimated DIF was amplified to .148, indicating
large DIF ($\hat{\beta}_U > .1$). In other words, when all DIF items act in concert, the difference in the
expected test scores between the groups was about .15. Thus, from column three, it can be
seen that at the item level each of these items is likely to be missed as a DIF item because
of its low value of estimated DIF; nonetheless, at the test level the amplification is such
that the total DIF is substantial. Similarly, from column five it can be seen that the
rejection rate for individual items ranged from .17 to .23, while the rejection rate for all five
items together jumped to .70, reflecting the cumulative effect of DIF. Comparison of
SIBTEST results with those of Mantel-Haenszel shows that the two procedures are
consistent in their assessment of the direction of DIF, the amount of estimated DIF, and the
standard error of estimate, whenever a single item was considered.
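The amplification in Table 2 can be checked with simple arithmetic. The following is a minimal sketch, not part of SIBTEST itself: it sums the five item-level estimates reported in Table 2 and labels each value with the small, moderate, and large cutoffs quoted in the text. The function name `dif_size` and the handling of values that fall exactly on a cutoff are assumptions made for illustration.

```python
# Item-level SIBTEST estimates for items 4-8, copied from Table 2.
item_beta = {4: 0.022, 5: 0.031, 6: 0.035, 7: 0.030, 8: 0.028}
bundle_beta = 0.148  # estimate reported when all five items are studied together


def dif_size(beta):
    """Label |beta| using the cutoffs quoted in the text (.05 and .1).

    Assumption: a value exactly on a cutoff is placed in the higher class;
    the text does not say how ties are handled.
    """
    magnitude = abs(beta)
    if magnitude < 0.05:
        return "small"
    if magnitude < 0.10:
        return "moderate"
    return "large"


labels = {item: dif_size(beta) for item, beta in item_beta.items()}
summed = round(sum(item_beta.values()), 3)

print("item-level labels:", labels)
print("sum of item-level estimates:", summed)
print("reported bundle estimate:", bundle_beta, "->", dif_size(bundle_beta))
```

In this table the sum of the item-level estimates lies close to the reported bundle estimate. That closeness is an observation about these simulated results, where every analysis uses the same valid subtest, and should not be read as a general identity.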

Table  3  displays  the  results  of  Amplification  Study  2.  In  this  case  the  degree  of 
potential  for  DIF  was  increased  to  1.0  and  the  sample  sizes  for  reference  and  focal  groups 
were  increased  to  3000  and  1000  respectively.  Items  9,  10,  and  11  (from  Table  1)  were 
selected for this study. Similar to the results in Table 2, for individual DIF items, the
amount of estimated DIF was moderate ($.05 < \hat{\beta}_U < .1$). However, when all three DIF items
were  included  in  the  studied  subtest,  the  amount  of  estimated  DIF  was  amplified  to  .225, 
indicating  large  DIF.  That  is,  when  all  three  DIF  items  act  in  concert,  the  estimated 
difference  in  the  expected  test  score  between  the  groups  was  beyond  0.2.  Comparison  of 
results  of  SIBTEST  with  those  of  Mantel-Haenszel  again  showed  that  they  are  consistent 
and  comparable  whenever  a  single  item  was  considered  for  DIF. 

Table 4 displays the results of the Amplification and Cancellation Study. Each of
the reference and focal groups contained 1000 examinees, with 25 items in the valid subtest.
Items 1, 2, and 3, which depend upon $\theta$ and $\eta_1$, were used here with 0.5 as the potential for
DIF against the focal group ($c_{\beta_1}$ positive). These studied items were tested individually
and collectively for DIF against the focal group. Items 12, 13, and 14, which depend upon $\theta$
and $\eta_2$, were used with -0.5 as the potential for DIF, that is, DIF against the reference group
($c_{\beta_2}$ negative). These items were also studied individually and collectively for DIF against the
reference group. Finally, all six items were used collectively with their corresponding
positive and negative DIFs to study DIF cancellation. As can be seen from Table 4, items
1, 2, and 3 together exhibit large positive DIF against the focal group ($\hat{\beta}_U = .188$), while
items 12, 13, and 14 exhibit large negative DIF against the reference group ($\hat{\beta}_U = -.185$).
However, when items 1, 2, 3, 12, 13, and 14 were combined in the studied subtest,
the DIF canceled out at the test score level ($\hat{\beta}_U = -.002$). Thus, this test, in spite of having
six DIF items, displays virtually no DIF at the test level. Note that SIBTEST was used
both to detect the amplification of positive DIF for items 1, 2, and 3 and the amplification
of negative DIF for items 12, 13, and 14, as well as the cancellation resulting from the
combined influence of all six studied items.
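The cancellation pattern can be illustrated the same way. This is a minimal sketch using only the item-level and bundle-level estimates reported in Table 4; the helper `bundle_sum` is a hypothetical name, and the comparison is descriptive rather than a property claimed by the procedure.

```python
# Item-level SIBTEST estimates from Table 4 (positive = DIF against the focal group).
item_beta = {1: 0.071, 2: 0.060, 3: 0.065, 12: -0.074, 13: -0.058, 14: -0.062}

# Bundle-level estimates reported in Table 4.
reported = {"1,2,3": 0.188, "12,13,14": -0.185, "all six": -0.002}


def bundle_sum(items):
    """Sum the item-level estimates for the given item numbers."""
    return round(sum(item_beta[i] for i in items), 3)


positive = bundle_sum([1, 2, 3])       # items with DIF against the focal group
negative = bundle_sum([12, 13, 14])    # items with DIF against the reference group
combined = bundle_sum(item_beta)       # all six items together

for label, value in [("1,2,3", positive), ("12,13,14", negative), ("all six", combined)]:
    print(f"{label}: sum of item estimates {value:+.3f}, reported {reported[label]:+.3f}")
```

The two one-directional bundles accumulate DIF of nearly equal size and opposite sign, so the six-item total is close to zero, in line with the reported value of -.002.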

In  summary,  the  simulation  studies  have  demonstrated  the  effectiveness  of 
SIBTEST  in  detecting  DIF  amplification  and  DIF  cancellation.  This  was  established  for 
different sample sizes and test lengths. Comparison of SIBTEST results with those of
Mantel-Haenszel, at the item level, shows that both perform about equally well.

Real  Data  Study 

Description  of  the  Data 

Three real data sets were used to investigate the effectiveness of SIBTEST in detecting
amplification and cancellation of DIF in a real application. The data sets considered were
the American College Testing (ACT) program mathematics test data, Form 39B, for males
and females, and the National Assessment of Educational Progress (NAEP) 1986 history test
data for males and females and for Blacks and Whites (NAEP, 1988). The mathematics
data consist of 60 items with 2115 males and 2885 females. The history data consist of 36
items with 1225 males, 1215 females, 1711 Whites, and 447 Blacks. The analyses were
carried out in the following manner.

For  each  of  the  data  sets,  DIF/DTF  analyses  were  performed.  That  is,  each  item 
was  analyzed  for  DIF  with  the  rest  of  the  items  forming  the  "valid  subtest".  In  the  first 
stage  of  item  level  analyses,  both  SIBTEST  and  Mantel-Haenszel  statistics  were  computed 
and  compared  for  each  item.  In  the  second  stage  of  test  level  analyses,  items  that  exhibited 
moderate  to  large  DIF  according  to  both  procedures  were  analyzed  together  to  investigate 
DIF amplification and cancellation. For these analyses, each studied subtest consisted of a
collection of items of one of three types: items favoring the focal group, items favoring
the reference group, or items of both kinds (that is, some items favoring the
reference group and other items favoring the focal group). Thus an attempt was made to
study  both  amplification  and  cancellation,  from  the  DTF  perspective. 
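The second stage can be sketched as a simple grouping step. The example below is an illustration only: it takes the ACT mathematics items listed in Table 5, keyed by the group they favor and by DIF size, and assembles the four studied subtests analyzed in Table 6. The dictionary layout and the helper `bundle` are assumptions introduced for the sketch.

```python
# Items flagged by both procedures, keyed by (favored group, DIF size), from Table 5.
table5 = {
    ("males", "moderate"): [23, 32, 34, 38, 48, 52, 58],
    ("males", "large"): [17],
    ("females", "moderate"): [4, 5, 9, 14, 29],
    ("females", "large"): [19],
}


def bundle(groups, sizes):
    """Collect the items whose favored group and DIF size are both requested."""
    return sorted(
        item
        for (group, size), items in table5.items()
        if group in groups and size in sizes
        for item in items
    )


both = {"males", "females"}
studied_subtests = {
    "large items, opposite directions": bundle(both, {"large"}),
    "moderate items favoring males": bundle({"males"}, {"moderate"}),
    "moderate items favoring females": bundle({"females"}, {"moderate"}),
    "all flagged items": bundle(both, {"moderate", "large"}),
}

for name, items in studied_subtests.items():
    print(f"{name}: {items}")
```

Each studied subtest would then be analyzed against a valid subtest formed from the remaining items; that analysis step is not shown here.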

Results  of  Real  Data  Study 

The results of the analyses of mathematics data for males and females are shown in
Tables 5 and 6. Table 5 shows the results of individual item analyses (that is, DIF
analyses). The items listed were identified as exhibiting DIF by both procedures,
SIBTEST and Mantel-Haenszel (see Note 4). The first half of Table 5 shows items exhibiting
moderate ($.05 < \hat{\beta}_U < .1$) to large ($\hat{\beta}_U > .1$) amounts of DIF favoring males. That is, these items
show DIF against females. The second half of Table 5 shows items exhibiting
moderate to large amounts of DIF against males.

Table 6 shows DIF amplification and cancellation effects for the items shown in Table 5.
It lists the items used in the studied subtest; whether the studied items favor males or
females; the amount of estimated DIF ($\hat{\beta}_U$); the value of the Shealy-Stout statistic ($B$ of
Equation 1); and the associated p-value. The first row of Table 6 shows the DIF cancellation
effect of items 17 and 19 together. Item 17 favors males with large DIF while item 19
favors females with large DIF, each at the item level. When these items were combined,
however, the DIF canceled out completely at the test level ($\hat{\beta}_U = -.0006$). That is,
although each of the items favors a different group at the item level, together at the
test level the DIF canceled out, resulting in no difference in the expected test scores of the
two groups. The second row of Table 6 shows DIF amplification for items showing moderate
DIF, each against females at the item level. The third row shows DIF amplification for
items showing moderate DIF, each against males at the item level. The last row shows DIF
amplification and cancellation when all items favoring males (with moderate and large
DIF) and all items favoring females are analyzed together. Because DIF amplification for
items favoring only males is higher in magnitude than DIF amplification for items favoring
only females, when all DIF items were combined, positive and negative DIF did not totally
cancel out. That is, there is some overall DTF for these items against females
($\hat{\beta}_U = .294$).
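The p-values attached to $B$ can be reproduced from the standard normal distribution. The sketch below assumes that $B$ is referred to a standard normal and that the reported p-value for the cancellation rows is the upper-tail probability; both points are inferences from the tabled values rather than statements taken from the tables. The function name `upper_tail_p` is hypothetical.

```python
import math


def upper_tail_p(b):
    """Return P(Z > b) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(b / math.sqrt(2.0))


# B values reported for the two cancellation rows (Table 6, items 17 and 19;
# Table 8, all ten flagged history items) and for the last row of Table 6.
rows = [
    ("Table 6, items 17 and 19", -0.06),
    ("Table 8, all flagged items", 0.24),
    ("Table 6, all flagged items", 4.68),
]

for label, b in rows:
    print(f"{label}: B = {b:+.2f}, upper-tail p = {upper_tail_p(b):.3f}")
```

Under these assumptions the first two rows give .524 and .405, matching the tabled p-values. Rows with a large negative $B$ and a reported p-value of .000 appear to use the lower tail instead, which is consistent with a one-sided test in the hypothesized direction.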

Tables 7 and 8 show the results of the analyses of the history test for males and
females. Analogous to Table 5, Table 7 shows items exhibiting moderate to large amounts
of DIF, by both procedures, for both groups. Table 8 shows the results of DIF amplification
and cancellation effects. In Table 7 there is only one item with large DIF, and it favors males.
The rest of the items exhibit moderate DIF. Therefore, Table 8 shows DIF amplification
results for items favoring males only; amplification results for items favoring females only;
and amplification and cancellation results for all DIF items. As can be seen from the last
row of Table 8, there is almost total cancellation of DIF ($\hat{\beta}_U = .018$) when all DIF items
were assessed together. Thus, there is essentially no DTF present in this case.

Tables  9  and  10  show  the  results  of  the  analyses  of  the  history  test  for  Whites  and 
Blacks.  Analogous  to  the  above  two  cases,  Table  9  shows  DIF  results  at  the  item  level  and 
Table  10  shows  DTF  results  at  the  test  level.  It  can  be  seen  from  Table  9  that  very  few 
items favor Blacks relative to the number of items that favor Whites. Therefore Table 10
contains only amplification results for items favoring Whites and amplification and
cancellation results for all the DIF items from Table 9. As expected, in this case, the
magnitude  of  DIF  amplification  against  Blacks  is  large,  and  when  all  DIF  items  were 
combined  together  there  is  only  moderate  DIF  cancellation  with  overall  DTF  remaining 
against  Blacks. 

In summary, findings of the real data studies have replicated findings from the simulation
studies in the sense that both amplification and cancellation were established. The results
of SIBTEST analyses at the item level were almost totally consistent with those of
Mantel-Haenszel, both in the direction and in the amount of estimated DIF. The
amplification  and  cancellation  results  using  SIBTEST  with  real  data  have  demonstrated 
the  capability  of  SIBTEST  to  address  these  issues  in  real  settings.  It  should  be  emphasized 
that  the  real  data  studies  were  DIF/DTF  and  not  bias  studies.  These  results  are 
encouraging  for  future  applications  of  SIBTEST  for  studying  the  cumulative  effects  of  DIF 
at  the  test  score  level. 

For all three sets of real data, content analyses of DIF items were performed in an
attempt to identify possible correlates of the occurrence of DIF and DTF. Upon
studying the mathematics items shown in Tables 5 and 6, it was found that items that
favored males and displayed amplification required analytical or geometric knowledge, such
as properties of triangles and trapezoids, angles in a circle, volume of a box, etc., whereas
items that favored females and displayed amplification required computational knowledge,
such as factorization, solving equations, etc. Based on these informal content analyses of
the two sets of items displaying amplification, one could cautiously conjecture that the math
education of males may tend to develop understanding of analytical concepts while the math
education of females may tend to develop computational skills. Similar conclusions were
drawn by Drasgow (1987) about the content of biased items on a different version of the
ACT mathematics test.

Similarly, the analyses of the history items for the male-female comparison revealed
that items favoring males involved factual knowledge, such as the location of different
countries on the world map, dates of certain historical events, etc., whereas items favoring
females involved reasoning about the Constitution, entrance to the League of
Nations, etc.

Content  analyses  of  history  items  for  Blacks  and  Whites  again  revealed  factual 
knowledge  items  favoring  Whites.  That  is,  these  items  required  knowledge  of  the  location 
of  different  countries  on  the  world  map,  facts  about  World  War  II,  etc.  There  were  only 
three  items  that  favored  Blacks  and  a  common  secondary  trait  in  these  three  items  was  not 
evident.  It  was  also  interesting  to  note  that,  across  the  three  data  sets,  the  difficulty  level 
of  items  that  exhibited  DIF  did  not  differ  significantly  from  the  difficulty  level  of  the  rest 
of the items in the respective tests. In other words, DIF was not related to the difficulty
level of items.


Summary  and  Discussion 

This paper has investigated DIF amplification and cancellation at the test score
level and SIBTEST's ability to detect and estimate each. Based on simulation as well as
real data analyses, SIBTEST demonstrated its effectiveness in assessing DIF at the item level
as well as at the test score level. As demonstrated, at the test score level the cumulative
effect of DIF could either amplify or cancel out, partially or completely. In addition, at the
item level of analysis, comparison of SIBTEST with Mantel-Haenszel showed mutual
consistency.

If one wants to detect bias rather than merely detect DIF or DTF, SIBTEST
requires a valid subtest, which serves as an internally
valid benchmark against which to assess bias. On the face of it, this requirement may sound
unrealistic. However, attempts by Ackerman (1991a, 1991b) and others seem promising for
obtaining an empirically validated valid subtest that could greatly assist in bias analyses.
As an alternative to using the "valid" subtest to match examinees, one could also use an
external criterion of the ability intended to be measured, in concert with or instead of the
valid subtest.

Study of DIF at the item level as well as at the test level can be very useful for test
construction purposes. It is well known that item responses are multiply determined in the
sense that multiple traits determine an examinee's response to each item. The decision to
remove or add items should not be based on item level analyses alone but should consider
the effect of such items at the test level. It is possible that one could add or remove items in
order to balance the influence of one or more secondary traits. Moreover, since decisions about
individuals are made at the test score level, it is important to simultaneously assess the
cumulative effect of several DIF items affecting different subpopulations at the test score
level. As emphasized by other researchers (Drasgow, 1987; Humphreys, 1986; Roznowski,
1987; Reith & Roznowski, 1991), inclusion of items with multiple determinants could
significantly improve the predictive as well as the construct validity of a test. Based on the
analyses presented herein, SIBTEST could greatly aid in this process.

Although a statistical hypothesis testing procedure can be useful in the detection of
test bias or DTF, it is important to distinguish statistically significant DTF from
a practically significant amount of DTF. This is because, with any statistical procedure, it
is well known that with large sample sizes small differences in group performance can result
in a statistically significant result. For example, Drasgow (1987) has shown, through Lord's
chi-square method, that a large significant chi-square statistic may reflect only moderate
bias at the test score level, even when one third of the items are biased. In the present
study, for example, it would be useful to know the practical significance of observing a $\hat{\beta}_U$
value of .1, .5, 1.0, etc. at the test score level. The estimated index of DIF, $\hat{\beta}_U$, should be
useful in assessing whether the amount of DIF present is of practical importance.
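The distinction between statistical and practical significance can be made concrete with a generic two-group example. The sketch below is not the SIBTEST statistic: it uses an ordinary z statistic for a difference in means, with equal group sizes and a common standard deviation, and the values of `diff` and `sd` are hypothetical. It shows how a fixed, small difference becomes statistically significant once the samples are large enough.

```python
import math


def z_for_mean_difference(diff, sd, n_per_group):
    """z statistic for a difference in two group means.

    Assumes equal group sizes and a common standard deviation, so the
    standard error of the difference is sd * sqrt(2 / n).
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return diff / standard_error


diff = 0.02  # hypothetical fixed difference in mean score between groups
sd = 0.5     # hypothetical within-group standard deviation

for n in (500, 5000, 50000):
    z = z_for_mean_difference(diff, sd, n)
    print(f"n per group = {n:>6}: z = {z:.2f}")
```

The difference itself never changes; only the standard error shrinks. This is why the size of the estimated index, and not only the test statistic, is needed to judge practical importance.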

SIBTEST, although derived using IRT, uses simple means and variances of scores on
valid and studied subtests to obtain test statistics. It is computationally simple and does
not involve IRT parameter estimation, thereby avoiding estimation problems. Simulation
and real data studies of this paper have demonstrated SIBTEST's potential for assessing
amplification and cancellation of DIF in a variety of situations. Nonetheless, more studies
with varied sample sizes and test sizes, and in diverse contexts, would be useful to further
establish its empirical utility. Menu-driven code and a user's manual are available on
request for interested users.
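To make the phrase "simple means and variances" concrete, the following is a simplified sketch of a matched difference-in-means statistic of this general kind. It is not the program referred to above and it omits the regression correction that SIBTEST applies to the matched means. The function name, the focal-group weighting, the variance formula, and the rule of skipping score levels with fewer than two examinees per group are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, variance


def uncorrected_beta(reference, focal):
    """Weighted difference in mean studied-subtest scores across matched levels.

    reference, focal: lists of (valid_subtest_score, studied_subtest_score).
    Returns (beta_hat, standard_error, B). Positive beta_hat indicates that
    reference examinees outscore matched focal examinees.
    Assumption: no regression correction is applied to the matched means.
    """
    ref_by_level, foc_by_level = defaultdict(list), defaultdict(list)
    for level, score in reference:
        ref_by_level[level].append(score)
    for level, score in focal:
        foc_by_level[level].append(score)

    # Keep levels with at least two examinees per group so variances exist.
    levels = [
        level for level in ref_by_level
        if len(ref_by_level[level]) >= 2 and len(foc_by_level.get(level, [])) >= 2
    ]
    n_focal = sum(len(foc_by_level[level]) for level in levels)

    beta_hat, var_hat = 0.0, 0.0
    for level in levels:
        ref, foc = ref_by_level[level], foc_by_level[level]
        weight = len(foc) / n_focal  # assumption: weight by focal-group counts
        beta_hat += weight * (mean(ref) - mean(foc))
        var_hat += weight ** 2 * (variance(ref) / len(ref) + variance(foc) / len(foc))

    standard_error = var_hat ** 0.5
    b_stat = beta_hat / standard_error if standard_error > 0 else float("nan")
    return beta_hat, standard_error, b_stat


# Tiny hypothetical data set: two valid-score levels, two examinees per cell.
reference = [(0, 1), (0, 1), (1, 1), (1, 1)]
focal = [(0, 0), (0, 1), (1, 1), (1, 0)]
beta_hat, standard_error, b_stat = uncorrected_beta(reference, focal)
print(beta_hat, standard_error, b_stat)
```

Matching on the valid subtest score is what separates DIF from an overall difference in ability between groups; the weights only determine which group's score distribution the summary is averaged over.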


Notes 

1 If $P(\theta, \eta)$ denotes the item characteristic curve, then the marginal $P(\theta)$ is obtained by
integrating out $\eta$ from $P(\theta, \eta)$ using the conditional density $f(\eta|\theta)$, that is,
$P(\theta) = \int P(\theta, \eta) f(\eta|\theta) \, d\eta$. $P(\theta)$ is interpreted as the
probability of a randomly chosen examinee with target ability $\theta$ getting the item right.

2 For some applications, it can make more sense to use reference group examinees or the
entire group of examinees.

3 Generally one finds nonzero differences in group means on the target ability (that is,
$d_T \neq 0$). However, there are many realistic situations where no differences in group means
exist. In the present study $d_T$ was taken as zero mainly to keep the design simple. The
effectiveness of SIBTEST in detecting DIF for varying $d_T$ values has been demonstrated by
Shealy and Stout (1992b) and by Roussos (1992). In these studies $d_T$ was used as a factor
in the experimental design.

4 Across the three data sets (total 132 items), there were seven items where there was
inconsistency between the SIBTEST and the Mantel-Haenszel analyses. Three items
exhibited DIF through SIBTEST only and four items exhibited DIF through
Mantel-Haenszel only. These items were not included in the studied subtest.


REFERENCES 


Ackerman, T. A. (1991a). A didactic explanation of item bias, item impact, and item
validity from a multidimensional perspective. Paper presented at the annual
meeting of the American Educational Research Association, Chicago.

Ackerman, T. A. (1991b). Measurement direction in a multidimensional latent space and
the role it plays in bias detection. Paper presented at the 1991 International
Symposium on Modern Theories in Measurement: Problems and Issues,
Montebello, Quebec, Canada.

Angoff, W. H. (1982). Use of difficulty and discrimination indices for detecting item bias.
In R. A. Berk (Ed.), Handbook of methods for detecting test bias (pp. 96-116).
Baltimore, MD: Johns Hopkins University Press.

Cleary, T. A., & Hilton, T. L. (1968). An investigation of item bias. Educational and
Psychological Measurement. 28, 61-75.

Dorans, N. J. (1989). Two new approaches to assessing differential item functioning:
Standardization and the Mantel-Haenszel method. Applied Measurement in
Education. 2, 217-233.

Dorans, N. J., & Kulick, E. (1983). Assessing unexpected differential item performance of
female candidates on SAT and TSWE forms administered in December 1977: An
application of the standardization approach (RR-83-9). Princeton, NJ:
Educational Testing Service.

Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization
approach to assessing unexpected differential item performance on the Scholastic
Aptitude Test. Journal of Educational Measurement. 23, 355-368.

Drasgow, F. (1987). Study of the measurement bias of two standardized psychological
tests. Journal of Applied Psychology. 72, 19-29.

Hambleton, R. K., & Rogers, H. J. (1989). Detecting potentially biased items: Comparison
of IRT area and Mantel-Haenszel methods. Applied Measurement in Education.
2(4), 313-334.

Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the
Mantel-Haenszel procedure. In H. Wainer and H. I. Braun (Eds.), Test Validity
(pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum Associates.

Humphreys, L. G. (1970). A skeptical look at the factor pure test. In C. E. Lunneborg
(Ed.), Current problems and techniques in multivariate psychology. Proceedings of
a conference honoring Professor Paul Horst (pp. 23-32). Seattle: University of
Washington.

Humphreys, L. G. (1986). An analysis and evaluation of test and item bias in the
prediction context. Journal of Applied Psychology. 71, 327-333.


Hunter, J. F. (1975). A critical analysis of the use of item means and item-test
correlations to determine the presence or absence of content bias in achievement test
items. A paper presented at the National Institute of Education Conference on Test
Bias. Annapolis, MD.

Ironson, G. H. (1983). Using item response theory to measure bias. In R. K. Hambleton
(Ed.), Applications of item response theory (pp. 155-174). Vancouver, BC:
Educational Research Institute of British Columbia.

Kok, F. (1988). Item bias and test multidimensionality. In R. Langeheine and J. Rost
(Eds.), Latent trait and latent class models (pp. 263-274). New York: Plenum
Press.

Lord, F. M. (1980). Applications of item response theory to practical testing problems.
Hillsdale, NJ: Erlbaum.

NAEP (1988). National Assessment of Educational Progress 1985-86 public-use data
tapes. Version 2.0. Users Guide. Educational Testing Service.

Petersen, N. S., & Novick, M. R. (1976). An evaluation of some models for culture-fair
selection. Journal of Educational Measurement. 13, 3-29.

Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika. 53,
495-502.

Reith, J., & Roznowski, M. (1991). Predictive relations of tests containing differentially
functioning items: Do biased items result in biased tests? Paper presented at the
annual meeting of the American Educational Research Association, Chicago.

Reckase, M. D., & McKinley, R. L. (1983). Some latent trait theory on a
multidimensional latent space. In D. J. Weiss (Ed.), Proceedings of the 1979
Computerized Adaptive Testing Conference. Minneapolis: University of
Minnesota, Department of Psychology.

Reynolds, C. R. (1982). Methods for detecting construct and predictive bias. In R. A.
Berk (Ed.), Handbook of methods for detecting test bias (pp. 199-227). Baltimore,
MD: Johns Hopkins University Press.

Roussos, L. (1992). Effects of small sample size and studied-item parameters on SIBTEST
and Mantel-Haenszel type-I error rates. Unpublished manuscript. University of
Illinois, Champaign, Illinois.

Roznowski, M. (1987). Use of tests manifesting sex differences as measures of intelligence:
Implications for measurement bias. Journal of Applied Psychology. 72, 480-483.

Scheuneman, J. (1979). A method of assessing bias in test items. Journal of Educational
Measurement. 16, 143-152.

Shealy, R., & Stout, W. F. (1992a). An item response theory model for test bias. In P. W.
Holland & H. Wainer (Eds.), Differential item functioning. Hillsdale, NJ: Lawrence
Erlbaum Associates.


Shealy, R., & Stout, W. F. (1992b). A model-based standardization approach that
separates true bias/DIF from group differences and detects test bias/DTF as well as
item bias/DIF. Psychometrika.

Shepard, L. A. (1982). Definitions of bias. In R. A. Berk (Ed.), Handbook of methods for
detecting test bias (pp. 9-30). Baltimore, MD: Johns Hopkins University Press.

Shepard, L. A., Camilli, G., & Averill, M. (1981). Comparisons of procedures for detecting
test-item bias with both internal and external ability criteria. Journal of
Educational Statistics. 6, 317-375.

Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using
logistic regression procedures. Journal of Educational Measurement. 27(4),
361-370.

Stout, W. F. (1990). A new item response theory modeling approach with applications to
unidimensional assessment and ability estimation. Psychometrika. 55, 293-326.

Wainer, H., Sireci, S. G., & Thissen, D. (1991). Differential testlet functioning: Definitions
and detection. Journal of Educational Measurement. 28, 197-219.


Table 1

Item Parameters of Studied Subtests
for Simulation Studies

Item   $a_{\theta}$   $b_{\theta}$   $a_{\eta_1}$   $b_{\eta_1}$   $a_{\eta_2}$   $b_{\eta_2}$   $c_i$

  1        1.0            0             0.80            0              0              0          .2
  2        1.5            0             0.75            0              0              0          .2
  3        2.0            0             1.00            0              0              0          .2
  4        0.8           -.3            0.20            0              0              0          .2
  5        1.0            0             0.25            0              0              0          .2
  6        1.5            .3            0.35            0              0              0          .2
  7        2.0            0             0.40            0              0              0          .2
  8        1.2            0             0.30            0              0              0          .2
  9        0.8           -.3            0.30            0              0              0          .2
 10        1.0            0             0.40            0              0              0          .2
 11        1.5            .3            0.50            0              0              0          .2
 12        1.0            0             0               0              0.80           0          .2
 13        1.5            0             0               0              0.75           0          .2
 14        2.0            0             0               0              1.00           0          .2

Table 2

Amplification Study 1
$J_F = 500$, $J_R = 500$, $N = 50$, $d_T = 0$, $\alpha = .05$

                            SIBTEST                                    Mantel-Haenszel
             Potential                                Rejection                                         Rejection
Item         for DIF    $\hat{\beta}_U$   SE($\hat{\beta}_U$)   rate      $\hat{\Delta}_{MH}$   SE($\hat{\Delta}_{MH}$)   rate

4              .50        .022            .036          .18          -.342              .435           .09
5              .50        .031            .031          .17          -.416              .398           .15
6              .50        .035            .035          .23          -.489              .423           .22
7              .50        .030            .039          .18          -.444              .450           .12
8              .50        .028            .039          .22          -.424              .445           .19
4,5,6,7,8      .50        .148            .067          .70            -                 -              -

Table 3

Amplification Study 2
$J_F = 1000$, $J_R = 3000$, $N = 50$, $d_T = 0$, $\alpha = .05$

                            SIBTEST                                    Mantel-Haenszel
             Potential                                Rejection                                         Rejection
Item         for DIF    $\hat{\beta}_U$   SE($\hat{\beta}_U$)   rate      $\hat{\Delta}_{MH}$   SE($\hat{\Delta}_{MH}$)   rate

9              1.0        .062            .015          .99          -.996              .223          1.00
10             1.0        .087            .019         1.00         -1.140              .272          1.00
11             1.0        .096            .019         1.00         -1.248              .256          1.00
9,10,11        1.0        .225            .028         1.00            -                 -              -

Table 4

Amplification and Cancellation Study
$J_F = 1000$, $J_R = 1000$, $N = 25$, $d_T = 0$, $\alpha = .05$

                                                                            Rejection
Item                 $c_{\beta_1}$   $c_{\beta_2}$   $\hat{\beta}_U$   SE($\hat{\beta}_U$)    rate

1                       0.5             -             .071            .021          .98
2                       0.5             -             .060            .023          .90
3                       0.5             -             .065            .021          .96
1,2,3                   0.5             -             .188            .040         1.00
12                       -            -0.5           -.074            .021         1.00
13                       -            -0.5           -.058            .022          .82
14                       -            -0.5           -.062            .021          .98
12,13,14                 -            -0.5           -.185            .036         1.00
1,2,3,12,13,14          0.5           -0.5           -.002            .061          .02


Table 5

Results of Mathematics Test: Males vs Females
Item Level DIF Analyses: SIBTEST & Mantel-Haenszel (1)

Items favoring males                                Items favoring females
$.05 < \hat{\beta}_U < .1$      $\hat{\beta}_U > .1$       $-.1 < \hat{\beta}_U < -.05$     $\hat{\beta}_U < -.1$

23, 32, 34, 38,                 17                  4, 5, 9, 14, 29                  19
48, 52, 58

(1) These items were identified as exhibiting DIF by both the SIBTEST
and the Mantel-Haenszel procedures.


Table 6

Results of Mathematics Test: Males vs Females
DTF Amplification and Cancellation: SIBTEST

items of the                   favors    favors
studied subtest                males     females    $\hat{\beta}_U$      B         p

17 & 19                          -         -         -.0006          -.06      .524

23, 32, 34, 38, 48,             yes        -          0.523         12.85      .000
52, 58

4, 5, 9, 14, 29                  -        yes        -.340         -10.15      .000

23, 32, 34, 38, 48,             yes        -          0.294          4.68      .000
52, 58, 17, 4, 5, 9,
14, 29, 19

Table 7

Results of History Test: Males vs Females
Item Level DIF Analyses: SIBTEST & Mantel-Haenszel

Items favoring males                                Items favoring females
$.05 < \hat{\beta}_U < .1$      $\hat{\beta}_U > .1$       $-.1 < \hat{\beta}_U < -.05$     $\hat{\beta}_U < -.1$

12, 15, 25, 30                  1                   9, 11, 22, 24, 34                -

Table 8

Results of History Test: Males vs Females
DIF Amplification and Cancellation: SIBTEST

items of the                   favors    favors
studied subtest                males     females    $\hat{\beta}_U$      B         p

12, 15, 25, 30, 1               yes        -          0.437          9.02      .000

9, 11, 22, 24, 34                -        yes        -.381          -7.87      .000

12, 15, 25, 30, 1, 9,                                 0.018          0.24      .405
11, 22, 24, 34


Table 9

Results of History Test: Whites vs Blacks
Item Level DIF Analyses: SIBTEST & Mantel-Haenszel

Items favoring Whites                               Items favoring Blacks
$.05 < \hat{\beta}_U < .1$      $\hat{\beta}_U > .1$       $-.1 < \hat{\beta}_U < -.05$     $\hat{\beta}_U < -.1$

7, 11, 12, 16,                  13, 14, 15,         3, 4, 5                          -
35                              17, 32, 36

Table 10

Results of History Test: Whites vs Blacks
DIF Amplification and Cancellation: SIBTEST

items of the                   favors    favors
studied subtest                Whites    Blacks     $\hat{\beta}_U$      B         p

all items favoring Whites       yes        -          1.310          9.96      .000
only

all items favoring Whites       yes        -          1.150          7.43      .000
and Blacks

Rockville.  MD  20650-3238 

Dr.  Marshall  J.  Farr 
Farr-S'ght  Co. 

2520  North  Vernon  Street 
Arlington.  VA  22207 

Dr.  l>eonard  Feldt 
Lindquist  Cemef 
for  Measurement 
University  of  Iowa 
Iowa  City.  IA  52242 

Dr.  Richard  L  Ferguson 
American  College  Testing 
PO  Box  »68 
Iowa  City.  I A  52243 

Dr.  Gerhard  Fischer 
Liebiggassc  5 
A  1010  Vienna 
AUSTRIA 

Dr.  Myron  Fischl 
U.S.  Army  Headquarters 
DAPE-HR 
The  Pentagon 

Washington.  DC  20310-0.VO 

Mr.  Paul  Foley 

Navy  Personnel  RAD  Center 

San  P*ego.  CA  92 1524*00 

Char.  Department  of 
Computer  Science 
George  Mason  University 
Fairfax.  VA  ::o30 

Dr.  Robert  D.  Gibbons 
University  of  Illinois  at  Chicago 
NPI  909 A.  M/C  913 
912  South  Wood  Street 

Chicago,  IL  <£>612 

Dr.  Janice  Gifford 
Unrverxiry  of  Massachusetts 
School  of  Education 
Amherst.  MA  01003 

Dr.  Robert  Glaser 
Learning  Research 

A  Development  Center 
University  of  Pittsburgh 
3939  O’Hara  Street 
Pittsburgh.  PA  15260 

Dr.  Susan  R.  Goldman 
Peabody  College  Box  45 
Vanderbilt  University 
Nashville,  TN  37203 

Dr.  Timothy  Goldsmith 
Department  of  Psychology 
University  of  New  Mexico 
Albuquerque  NM  87131 


01/27  *i2 


Dr  She  me  Goa 
aFHALMOMJ 
Brooks  AFB.  TX  78Z35-5M)1 

Dr  Ben  Green 
John*  Hopktrw  University 
Department  of  Psychology 
Charles  A  3-lth  Street 
Baltimore.  MD  21218 

Prof.  Edward  Haertel 
Sshooi  of  Education 
Stanford  University 
Stanford.  CA  94305- 30*6 

Dr.  Ronald  1C  Hambleioo 
University  of  Massachusetts 
Laboratory  of  Psychometric 
and  Evaluative  Research 

Hill*  South.  Room  152 
Amherst  MA  01003 

t>.  Dctwyn  Hanuach 

Unnerwy  of  lllino«* 

Gerry  Drive 
Champaign.  1L  **1820 

Dr  Patrick  R  Hamson 
Computer  Science  Department 
US  Naval  Academy 
Annapolis.  MD  21402-5002 

M*  Rebecca  Hetier 

Navy  Personnel  R4D  Center 

Code  13 

San  D.ego.  CA  92152-6800 

Dr  Thomas  M.  Hirsch 
ACT 

P  O  Bex  168 
Iowa  C.rv.  1A  sro 


Dr.  Kumar  Joag-dev 
University  of  Ulmoia 
Department  of  Su  u»bca 
101  Mini  Hail 
725  South  Wnght  Sired 
Champaign.  1L  61820 

Professor  Douglas  K  Jones 
Graduate  School  of  Management 
Rutgers,  The  State  University 
of  New  Jersey 
Newark.  N)  0?102 

Dr.  Brian  Junker 
Camegie  Melloo  University 
Department  of  Statute* 

P»'  'urgh.  PA  15213 

..  Marcel  Just 
Carnegie  Mellon  University 
Department  of  Psychology 
Schenley  Part 
Pittsburgh.  PA  15213 

Dr.  J.  L  Kaiun 
Code  442/JK 

Naval  Ocean  System*  Center 
San  Dtcgo.  CA  92152  5000 

Dr.  Michael  Kaplan 
Office  of  Banc  Research 
US  Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria.  VA  22333-5600 

Dr.  Jeremy  Kilpatrict 
Department  of 

Mathematics  Education 
105  Adertvotd  MaH 
Unrversiry  of  Georgia 
Athens,  GA  >0602 


Richard  Lamcnnan 
Commandant  (G-PWP) 

US  Coast  Guard 
2100  Second  St.  SW 
Washington.  DC  20593-0001 

Dr.  Michael  Levine 
Educational  Psychology 
2 10  Education  Bldg. 

1310  South  Sunh  Street 
Unrversiry  of  1L  at 
Urt  ana  -Champaign 
Champaign.  IL  61820-6990 

Dr.  Charles  Lewi* 

Educational  Testing  Service 
Pnnceton,  NJ  08541-0001 

Mr.  Hun-hung  U 
University  of  lllinoi* 
Department  of  Statistics 
101  lll.m  Had 
725  South  Wnght  Sc 
Champaign.  )L  61820 

Library 

Naval  Training  Systems  Cem.r 
123*0  Research  Parkwa> 
Orlando  FL  3282**- 32 J4 

Dr  Marcia  C  Lnn 
Graduate  School 
of  Education.  EM  ST 
Telman  HaU 
UnAers.ty  of  Cahfomu 
Berkrto  CA  <*i*20 

Dr  R  *»n  L  L*wi 
Campus  EV'S  2*4 
Un*vers.i>  of  Oc*c  -ado 
Boulder  CO  *  >:  *9 


Dr  Paul  W  HotUnd 
Educational  Testing  Service.  21-T 
Rote dok  Road 
Pnnceton.  NJ  0854] 

Prof  Luu  F  Homke 
Insiiiut  fur  Psycholog* 

RATH  Aachen 
Jaegcrstrasse  P'19 
D  '  1*4)  Aachen 
WEST  GERMANY 

Ms  Julia  S  Hough 
Cjmbr-dge  University  Presa 
-sei  West  20 h  Sired 
Sew  York.  NY  10011 

Dr  A-iliam  Hm**8 
Chief  Sciential 
AFHRLCA 

Br.R'ks  AFB.  TX  78235-5601 

Dr  Huynh  Huynh 
Co"ege  of  Education 
Unit  of  South  Caroline 
Columbia.  SC  29208 

Dr  Martin  J  tppef 
Center  for  *he  Study  of 
Lducat-on  and  Instruction 
Le.Jen  L’nvefSity 
P  O  Box  95  55 
23*  *t  RB  Lnden 
THE  NETHERLANDS 

Dr  Robert  Jannarone 
fire  and  Computer  Eng  Depc 
Uniterurv  of  South  Carolina 
Columbia.  SC  29208 


Ma.  Hac-Rim  Kim 
University  of  Ittmoia 
Department  of  Statiaiica 
101  llltm  HaN 
723  South  Wnght  Sc 
Champaign.  IL  61820 

Dr.  Jwa-krun  Kim 
Department  of  Psychology 
Middle  Tennessee  State 
University 

Murfreesboro.  TN  37132 

Dr.  Sung- Moon  VLm 
K£DI 

92-4  Umyeon  Dong 
Seocho-Gvt 

Seoul 

SOUTH  KOREA 

Dr  G.  Gage  Kmgsbury 
Portland  Public  School* 

Research  and  Evaluation  Department 
501  North  Dixon  Sued 
P.  O.  Box  3107 
Portland.  OR  97209-3107 

Or.  WiHum  Koch 
Bos  724A  Mea*  and  Ev*l  Cir 
University  of  Texaa-Auasm 
Austin.  TX  78703 

Dr.  James  Kraau 
Computer -baaed  Education 
Research  Laboratory 
University  of  lUmoa 
Urban*.  IL  61801 

Dr.  Patnct  Kyhonen 
AFHRL/MOEL 
Brook*  AFB,  TX  78235 


LogKon  Inc  V  *  -bin  ' 
Tactical  sod  ‘*'•*<‘'*1  Sut-i 
Division 

PO  Bo.  85158 

San  Diego.  C A  92138-5148 

Dr.  Richard  Luecht 
ACT 

P  O  Box  168 
Iowa  C.ry,  1A  52243 

Dr  George  B  Macready 
Department  of  Measurement 
Statistics  it  Evaluation 
College  V  Education 
Un*v*r».t>  of  Msrytand 
Collett  Part.  MD  20-42 

Dr  Evan*  Mandes 
George  Mason  Unwers  *v 
4*no  Unners*ry  Dme 
Fairfax.  VA  22030 

Dr  Paul  Mayberry 
Center  for  Naval  AnaNi-s 
4401  Ford  Avenue 
PO  Box  162b8 
Alexandria  VA  22302  f'/'^ 

Dr.  James  R.  McBnde 
HumRRO 

6J.V)  Eimhurst  Dnve 
San  Diego.  CA  92120 

Mr  Christopher  MrCusker 
University  of  lllinou 
Department  of  Psycholofc- 
603  E.  Darnel  Sc 
Champaign.  IL  61820 


M*.  Carolyn  Laney 
1515  Spence  nolle  Rod 
Spencerville.  MD  20868 


Dr  Robert  McKinley 
Educational  Testing  Service 
Pnnceton,  NJ  08541 


Or  Joseph  McLacbtan 
N*vy  Personnel  Research 
and  Development  Center 
Code  14 

San  Dx-go.  CA  92152  6*00 
Alan  Mead 

c'o  Or  Michael  Lcwne 
Educational  Psychology 
210  Education  BUg, 

University  of  IDinoii 
Champaign  IL  61801 

Dr  Timothy  Millet 
ACT 

P  O  Ho«  1*6 
Iowa  City.  IA  52243 

Or.  Robert  Mulevy 
Educational  Tesung  Service 
Princeton.  NJ  06541 

Dr.  No  Molcnar 

F«<ultex  Sonak  Wetenschappen 

R.;kiuntvcrs»tc*  Groningen 

Grotc  kruimraat  2/1 

9?J2  TS  Groningen 

The  NETHERLANDS 

Or  £.  Muraki 
Educational  Tearing  Service 
Ro*cJalc  Road 
Princeton  NJ  06541 

Or  Ratna  Sandakumar 
Edjcaoonal  Studies 
W.ltjrd  HaiL  Room  21JE 
Un«v entry  of  Delaware 
*»c%»art,  DH  J97]6 

Academic  Prog*.  A  Research  Branch 
Njul  Technical  Training  Command 
CoJe  N-62 
N  AS  Memphi*  (75) 

Mtftngton.  TV  50654 

Or  N*  Alan  Nicewindcr 
U  nnern ry  of  Oklahoma 
Depan  mem  of  P*ycbo*T£y 
Norman  Ok  73071 

Head.  Personnel  Systems  Department 
NPRDC  (Code  .2) 

S^n  O-ego.  CA  92152-6800 

D  r  eel  or 

Training  Svstema  Department 
NPRDC  (Code  14) 

S«n  C*ego.  CA  92152-6800 

Library.  NPRDC 
Code  Oil 

,n  Otego.  CA  92152-6600 
L.  ho  nan 

Naval  Center  for  Applied  Research 
m  Artificial  Intelligence 
Na»a)  Research  Laboratory 
CoJe  5510 

\k*»h.ngtorv  DC  20? 75  5000 

Office  of  Naval  Research. 

Code  1142CS 
N*  V  Quincy  Street 
Arl.ngton.  VA  21217-5000 
(b  Cop*«) 

Special  Assistant  for  Research 
Management 

Chief  of  Naval  Personnel  (PERS-OUT) 
Department  of  the  Njvy 
W0h.ngion.  DC  203592000 

Dr  JixJiih  Orasanu 

Mail  Stop  23*1 

NASA  Ames  Research  Center 

Moffett  Field.  CA  *<035 


Dr.  Peter  J.  Psshley 
Educational  Testing  Service 
Rosedak  Road 
Princeton,  NJ  06541 

Wayne  M.  Patience 
American  Council  on  Education 
GED  Testing  Service,  Suite  20 
One  Dupont  Cirde,  NW 
Washington.  DC  20036 

Dept  of  Administrative  Sciences 
Code  54 

Naval  Poet  graduate  School 
Monterey.  CA  93943-5026 

Dr.  Peter  Pirohi 
School  of  Education 
University  of  California 
Berteky.  CA  94720 

Dr.  Mar*  D.  Reckase 
ACT 

P.  O.  Boa  166 
Iowa  City.  LA  52243 

Mr  Steve  Rebe 
Department  of  Psychology 
University  of  California 
Riverside,  CA  92521 

Mr.  Louis  Rouaaot 
University  of  Illinois 
Department  of  Statistics 
101  Uwv  Hall 

775  <•  10th  Wnght  Sc 
smpaign.  IL  61620 

Dr,  Donald  Rubin 
Statistics  Department 
Science  Center.  Room  60S 
1  Chrford  Street 
Harvard  University 
Cambridge,  MA  02136 

Dr.  Fumiko  Samcjima 
Department  of  Paychotop 
University  of  Tennessee 
3 10B  Auatm  Pray  Bldg. 
Knoxville.  TN  379^6-0900 

Dr.  Mary  Schrsa 
4100  Partside 
Carlsbad,  CA  92006 

Mr.  Robert  Semmes 
N216  Elliott  Hafl 
Department  of  Psychology 
University  of  Minnesota 
Minneapolis.  MN  55455-0344 

Dr.  Valerie  L  Shalm 
Department  of  Industrial 
Engineering 

State  Uoneraity  of  New  Yort 
342  Lawrence  D  Bell  HaD 
Buffalo.  NY  14260 

Mr.  Richard  3.  Shavetsots 
Graduate  School  of  Education 
Unnersity  of  California 
Santa  Barbara,  CA  93106 

Me  Kathleen  Sheehan 
Educational  Testing  Service 
Princeton.  N3  06541 

Dr.  Kazuo  Shigemasu 
7-9-24  Kugenuma-Kaigan 
Fujiuwa  251 
JAPAN 

Dr.  Randall  Shumaker 
Naval  Research  Laboratory 
Code  5500 

4555  Overtook  Avenue,  S-W. 
Washington  DC  20375-5000 


Dr.  Judy  Spray 
ACT 

P  O.  Boa  166 
!o*a  City.  IA  $2243 

Dr.  Martha  Stocking 
Educational  Testing  Service 
Princeton,  NJ  06541 

Dr.  William  Stout 
University  of  Illinois 
Department  of  Statistics 
101  lllim  Hall 
725  South  Wnght  Sl 
Champaign  IL  61620 

Dr.  Kikumi  Tatauoka 
Educational  Testing  Service 
Mail  Stop  03-T 
Princeton.  NJ  06541 

Dr.  David  Thissen 
Psychometric  Laboratory 
CB#  3270,  Davie  Hall 
Unrversity  of  North  Carolina 
Chapel  Hill.  NC  275993270 

Mr.  Thomas  J.  Thomas 
Federal  Express  Corporation 
Human  Resource  Development 
3035  Director  Row.  Suite  501 
Memphu,  TN  36131 

Mr.  Gary  Thomassoo 
University  of  Illinois 
Educational  Psychology 
Champaign.  IL  61620 

Dr.  Howard  Warner 
Educational  Testing  Service 
Pnncctoa  NJ  08541 

Elizabeth  Wald 

Office  of  Naval  Technology 

Code  227 

800  North  Quincy  Street 
Arlington  VA  22217-5000 

Dr.  Michael  T.  Waller 
University  of 
M'i$<on$>n-  M  »h*  auk  ec 
Educational  Psychology  Dept. 
Bo*  413 

Milwaukee.  91  53201 

Dr.  Ming- Me*  W«ng 
Educational  Testing  Service 
Mail  Stop  03-T 
Pnnceton  NJ  C654; 

Dr.  Thomas  A.  Warm 
FAA  Academy 
P.0  Bo*  25062 
Oklahoma  City,  OK  73115 

Dr  David  J.  Wciaa 
N'660  Elliott  Hall 
University  of  Minnesota 
75  £.  River  Road 
Minneapolis.  MN  5S45'-0*4i 

Dr.  Douglas  Wetzel 
Code  15 

Navy  Personnel  RAD  Center 
San  Diego.  CA  92152  <•&*) 

German  Military 
Representative 
Persona  1st  a  mmamt 
Koelner  Sir.  262 
D  5000  Koeln  90 
WEST  GERMANY 


I  ’7  *0 


Dr  Dav.d  W.lcy 
School  o(  Fv'i^auoo 
and  ixul  i’oUcy 

NonNnum  Umtcmiy 

[  »jn»i>>n.  IL 

f)r  llrLKc  WiMuma 
D*  |unmcni  of  Educational 
Paychoiopr 
Uniser»<»y  »>f  lllinon 
Urtvina.  IL  0I8OI 

;v  M.irk  WiistMi 
S  h.»il  of  Fdocaiton 

of  California 
IWrkck>.  CA 

IV  Tufenc  Winograd 
Depart  meni  of  Payvhofogy 
Fmory  Unneruty 
AiUnu.  C.  A  W22 

IV  M.inm  F.  WtskofT 
IM  RSI.REC 
vw  l'.KifiC  St.  Sum*  4V'6 
Montcrry  CA  93<MO 

Mr  John  M.  Wolfe 

N.o>  Personnel  RAD  Center 

S.»n  D.rgo.  CA  9215:  ^0 

Dr  Kenuro  Yamamoto 

I.v.rr 

[J  actional  Teiung  Service 
Ro*ed.»k  Road 
I'r. melon.  NJ  OftMl 

Mi  Du.mli  Yan 
!  J-cai.ohal  Testing  Secure 
Pnmeiorv  NJ 

Dr  VVrndy  Yen 

cm  MvC.«*  h.« 

;v  Monte  Rnearrh  Part 

M.mfercy.  CA  9.VW0 

IV  .K>*cph  L  Young 
kj»i.wul  Science  Foundation 
K  n'iT)  VO 

.>»••  C  Si  n  et  N  W. 

U  ,«*hingi.in.  DC  20?50 


